Efficient Polynomial-Time Nested Loop Fusion with Full Parallelism

Authors

  • Edwin Hsing-Mean Sha
  • Timothy W. O'Neil
  • Nelson L. Passos
Abstract

Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way to reduce synchronization and improve data locality. Traditional fusion techniques, however, either cannot handle fusion-preventing dependencies in nested loops or cannot achieve good parallelism after fusion. This paper extends current loop fusion techniques with several efficient polynomial-time algorithms that solve these problems. Based on multi-dimensional retiming, these algorithms allow nested loop fusion even in the presence of outermost loop-carried dependencies or fusion-preventing dependencies. The multiple loops are modeled by a multi-dimensional loop dependence graph, and the algorithms are applied to this graph to perform the fusion and obtain full parallelism in the innermost loop.
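
The idea of fusing past a fusion-preventing dependence can be illustrated with a small sketch. The code below is not the paper's algorithm; it is a hand-worked example under assumed names (`unfused`, `fused_retimed`, arrays `a`, `b`, `x`) in which the second inner loop reads `a[i][j + 1]`, so a naive fusion would read a value before it is written. Shifting the second loop's body by one outer iteration, which is a retiming by the vector (1, 0) in this toy case, makes every dependence outer-loop-carried, so the fused inner loop has no dependencies between its iterations.

```python
def unfused(x, n, m):
    """Original form: two separate inner loops per outer iteration."""
    a = [[0] * (m + 1) for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m + 1):      # loop A: produces row i of a
            a[i][j] = 2 * x[i][j]
        for j in range(m):          # loop B: reads a[i][j + 1]
            b[i][j] = a[i][j] + a[i][j + 1]
    return a, b


def fused_retimed(x, n, m):
    """Fused form: loop B's body is delayed by one outer iteration.

    At iteration (i, j) the body writes a[i][j] and b[i - 1][j] and
    reads only row i - 1 of a, which was completed in the previous
    outer iteration. No inner iteration depends on another, so the
    inner loop could run fully in parallel (it is run sequentially here).
    """
    a = [[0] * (m + 1) for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for i in range(n + 1):          # one extra outer iteration drains loop B
        for j in range(m + 1):
            if i < n:               # loop A body, unshifted
                a[i][j] = 2 * x[i][j]
            if i >= 1 and j < m:    # loop B body, shifted by (1, 0)
                b[i - 1][j] = a[i - 1][j] + a[i - 1][j + 1]
    return a, b


if __name__ == "__main__":
    n, m = 3, 4
    x = [[10 * i + j for j in range(m + 1)] for i in range(n)]
    # Both forms should produce identical arrays.
    print(unfused(x, n, m) == fused_retimed(x, n, m))
```

The guards inside the fused body stand in for the prologue and epilogue that a compiler would normally peel off; they are kept inline here only to keep the sketch short.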


Similar resources

Polynomial-Time Nested Loop Fusion with Full Parallelism

Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way to reduce synchronization and improve data locality. Traditional fusion techniques, however, either cannot address the case when fusion-preventing dependences exist in nested loops, or cannot achieve good parallelism after fu...


Parallélisme des nids de boucles pour l'optimisation du temps d'exécution et de la taille du code. (Nested loop parallelism to optimize execution time and code size)

Real-time implementations of algorithms always include nested loops, which require significant execution time. Several nested loop parallelism techniques have therefore been proposed to reduce that time. These techniques can be classified by granularity into iteration-level parallelism and instruction-level parallelism. In the case of the instructio...



Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution

Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...


Full Parallelism in Uniform Nested Loops Using Multi-Dimensional Retiming

Most scientific and DSP applications are recursive or iterative. Uniform nested loops can be modeled as multi-dimensional data flow graphs (DFGs). Achieving full parallelism of the loop body, i.e., executing all the computational nodes in parallel, substantially decreases the overall computation time. It is well known that for one-dimensional DFGs retiming cannot always achieve full parallelism...



Journal:
  • I. J. Comput. Appl.

Volume: 10

Publication date: 2003